Abstract: In this paper, we introduce a general extension of linear
sparse component analysis (SCA) approaches to postnonlinear (PNL)
mixtures. In particular, and contrary to the state-of-the-art
methods, our approaches use a weak sparsity source assumption: 
we look for tiny temporal zones where only one source is active. 
We investigate two nonlinear single-source confidence measures, 
using the mutual information and a local linear tangent space 
approximation (LTSA). For this latter measure, we derive two 
extensions of linear single-source measures, respectively based on 
correlation (LTSA-correlation) and eigenvalues (LTSA-PCA). A 
second novelty of our approach consists of applying functional 
data clustering techniques to the scattered observations in the 
above single-source zones, thus allowing us to accurately estimate 
them. We first study a classical approach using a B-spline approx- 
imation, and then two approaches which locally approximate the 
nonlinear functions as lines. Finally, we extend our PNL methods 
to more general nonlinear mixtures. Combining single-source 
zones and functional data clustering allows us to tackle speech 
signals, which has never been performed by other PNL-SCA 
methods. We investigate the performance of our approaches with 
simulated PNL mixtures of real speech signals. Both the mutual 
information and the LTSA-correlation measures are better-suited 
to detecting single-source zones than the LTSA-PCA measure. We 
also find local-linear-approximation-based clustering approaches 
to be more flexible and more accurate than the B-spline one. 

Index Terms — Source separation; Nonlinear system identi- 
fication; Sparse component analysis; Post-nonlinear mixtures; 
Single-source confidence measures; Functional data clustering; 
Speech. 



I. Introduction 

Blind Source Separation (BSS) consists of estimating
a set of N unknown source signals $s_j$ from a set of
P observations $x_i$ resulting from mixtures of these sources
through unknown propagation channels [1]. Among all the
proposed approaches, the ones based on sources joint-sparsity, 
known under the name of Sparse Component Analysis (SCA) 
methods, have met with great interest in the community in 
the last decade (see e.g. [1, Ch. 10]). Indeed, they are nat- 
urally adapted to stationary, non-stationary and/or dependent 
signals and are thus an alternative to classical Independent 
Component Analysis (ICA) approaches which assume source 
mutual independence. Moreover, they allow processing of the 
underdetermined case where N > P. 

Most of the SCA approaches have been proposed for linear 
mixtures, i.e. linear instantaneous (LI), anechoic or convolutive 
mixtures. While many methods assume the sources to be 
(approximately) W-disjoint orthogonal (WDO) in an analysis
domain¹ [2], several other methods greatly relax this assumption
by looking for "single-source zones" (i.e. zones where
one source is dominant over the others) [3]-[7]. Interestingly,
while many SCA methods have been proposed for linear
mixtures, only a few sparsity-based methods process nonlinear
configurations [8]-[11]. In [8], [9], the authors consider
post-nonlinear (PNL) mixtures (i.e. a special configuration where
linear mixtures of sources are distorted by a function which
models data acquisition/sensor nonlinearities, such as saturation),
and assume the sources to be approximately WDO². Unfortunately,
these approaches are not tested with real-life source
signals, mainly because of the strong sparsity assumption. In
[10], [11], the authors extend the measures for finding
single-source zones to other classes of nonlinear mixtures but restrict
their approach to overdetermined or determined mixtures.

In this paper, we propose an approach for identifying PNL
mixtures³ based on single-source zones, as in [10], [11], and
which possibly processes the underdetermined case, as in [8],
[9]. We thus avoid the strong source sparsity assumption of [8],
[9] while processing the same class of mixtures and applying
our approach to mixtures of real speech signals. Our main
contribution is dedicated to the estimation of nonlinear mappings,
by combining single-source zones (found using confidence
measures well-suited to nonlinear mixtures) and functional
data clustering. We thus provide a way to extend linear SCA
[3]-[7] to PNL-SCA. This work has been partially proposed
in [12]. However, here we extend [12] in several ways: we
propose several single-source confidence measures well-suited
to PNL mixtures and several methods to cluster the functional
data points. Moreover, we present an exhaustive experimental
validation of the approaches. An extension of the proposed
approaches to more general mixtures, partially proposed in
[13], is also investigated in this paper. In particular, here we
give a better characterization of the achieved performance than
in [13].

The remainder of the paper is structured as follows: in
Section II we describe the considered BSS problem. We then
introduce our proposed method in Sections III, IV, and V.
Section VI provides an experimental validation of the approach,
and we conclude and discuss future work in Section VII.
Appendix A introduces the extensions of the proposed PNL
approaches to more general nonlinear mappings.

¹ The WDO assumption means that in each atom of an analysis domain (e.g.
time, time-frequency, time-scale domain), at most one source is non-zero.

² Actually, in [8], the authors assume the sources to be (P - 1)-sparse,
which is equivalent to WDO if P = 2. In [9], the approximate WDO is not
explicitly assumed but is needed by the authors and satisfied in their tests.

³ An extension to other nonlinear mixtures is provided in Appendix A.

Fig. 1. PNL mixing-separating structure.

II. Problem statement, definitions and assumptions

In this paper, we assume that N real source signals
$s(t) = [s_1(t), \ldots, s_N(t)]^T$ are mixed by an unknown LI
$P \times N$ mixing matrix $A$, thus providing a set of linearly mixed signals

$$z(t) = A\,s(t), \tag{1}$$

to which a nonlinear componentwise vector mapping
$f = [f_1, \ldots, f_P]^T$, assumed to be invertible, is applied. This
mapping can, for example, model data acquisition/sensor nonlinearities
such as saturation. Such a situation arises, e.g., in audio processing
with small and cheap microphones in a mobile device. The observed
signals $x(t)$ thus read

$$x(t) = f(z(t)) = f(A\,s(t)). \tag{2}$$

We aim to estimate the source signals $s(t)$, up to a scale
coefficient/permutation indeterminacy. This means that we
want to suppress or minimize the distortions introduced by the
nonlinear mappings $f_i$. For that purpose, we use a separating
structure which is the mirror of the mixing one (see e.g.
[1, Ch. 14]): we first have to estimate $g_i$, the inverses of the
nonlinear mappings $f_i$, and to apply them to the observations.
We then obtain a linear problem comparable to (1), and we
apply a linear SCA approach to estimate the sources. The
global mixing and separating structure is shown in Fig. 1. The
proposed separating structure may be summarized as follows:

1) We first look for temporal zones where one source is
dominant over the others (see Section III).

2) We then estimate the nonlinear mappings $f_i$ (see Section IV).

3) We then invert the nonlinearities and get an LI-BSS
problem, which we solve using an LI-SCA approach (see
Section V).

Before presenting the proposed approach, we state its only
assumptions and their associated definitions.

Definition 1 ([8]): Let $A = [a_{ij}]$ be a $P \times N$ matrix. Then
A is said to be "mixing" if A has at least two nonzero entries
in each row. And A is said to be "absolutely degenerate" if
there are two normalised columns $a_k$ and $a_l$ with $k \neq l$ such
that $|a_k| = |a_l|$, i.e. $a_k$ and $a_l$ differ only by the sign of the
entries.

Assumption 1 (Mixing assumptions): 

1) A has nonzero entries on the first row and at least one
nonzero entry in each other row. We cannot find two
collinear vectors $|a_k|$ and $|a_l|$, with $k \neq l$.

2) In the underdetermined case when N > P, every $P \times P$
submatrix of A is invertible.

3) We also assume that, for each index i, $f_i(0) = 0$.

Assumption 1.1 is needed for the following reasons [8]: if A is
not "mixing" (according to Definition 1), then
there is an index i such that the i-th row of A contains only one
non-zero element and consequently $f_i$ cannot be identified. As
an extreme case, let us imagine that A is diagonal (up to a
permutation in the order of its columns); then each observed
signal reads

$$x_i(t) = f_i\big(a_{ik}\,s_k(t)\big), \tag{3}$$

i.e. we already get separated signals with which we can do
nothing more without extra information. If A is absolutely
degenerate, it can be estimated, but the nonlinear mappings
cannot [8]. Assumption 1.2 is a classical assumption in
underdetermined BSS. It means that locally, if only P sources are
active, we get a determined BSS problem which needs to be
separable. Lastly, Assumption 1.3 is not limiting for practical
applications and is shared by many PNL-BSS approaches.

Definition 2: A "temporal analysis zone" is a subset T of 
the time domain. 

From a theoretical point of view, each temporal analysis zone
may be set to any kind of subset of the time domain, and
might even be restricted to a single time instant t. However,
in practice, we set these zones to temporal intervals, denoted
$\mathcal{T}$.

Definition 3: A source is said to be "isolated" in a temporal 
analysis zone T if only this source (among all considered 
mixed sources) has a nonzero variance in this zone. We then 
say that this zone is "single-source". 

This definition corresponds to the theoretical point of view. 
From a practical point of view, this means that the energy of 
all other sources is negligible with respect to the energy of the 
source which is isolated. 

Definition 4: A source is said to be "accessible" in the time 
domain if there exists at least one temporal analysis zone 
where it is isolated. 

Assumption 2 (Source assumptions): 

1) Source signals are mutually independent. 

2) At least P sources are accessible in the time domain. 

3) By considering several single-source zones associated
with the same source, the amplitude of the observations
spans a "wide" range allowing the estimation of the
nonlinear functions.

Note that, contrary to linear BSS methods [3], [6] which
needed source linear independence, here we need source
mutual independence. This is due to the more complex mixing
model, as we will see in Section III. We need P isolated
sources in order to be able to invert the P nonlinear functions
$f_i$ [11]. Assumption 2.3 is needed because we want to estimate
the nonlinear mappings $f_i$ on their whole domains. If we were
only able to estimate the functions $f_i$ on a
subset of their domains, the whole estimation might be coarse,
thus yielding a poor quality of separation.
Fig. 2. Scatter plots of observations on a single-source zone and theoretical
curves. Left: LI mixtures. Right: PNL mixtures. 



III. Nonlinear single-source confidence measures

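Before introducing the proposed method, let us focus on the
intuitive idea behind it. If a source, say $s_k$, is isolated, then

$$x_i(t) = f_i\big(a_{ik}\,s_k(t)\big), \quad \forall i \in \{1, \ldots, P\},\ \forall t \in \mathcal{T}, \tag{4}$$

and we obtain, assuming that $a_{ik} \neq 0$,

$$s_k(t) = \frac{f_i^{-1}\big(x_i(t)\big)}{a_{ik}}, \quad \forall i \in \{1, \ldots, P\},\ \forall t \in \mathcal{T}. \tag{5}$$

We thus have the following relationship between observations
$x_1$ and $x_i$ for any $t \in \mathcal{T}$:

$$x_i(t) = f_i\!\left(\frac{a_{ik}}{a_{1k}}\, f_1^{-1}\big(x_1(t)\big)\right) = \phi_{ik}\big(x_1(t)\big), \tag{6}$$

where the functions $\phi_{ik}$ are defined as:

$$\phi_{ik}(u) = f_i\!\left(\frac{a_{ik}}{a_{1k}}\, f_1^{-1}(u)\right). \tag{7}$$

The right plot of Fig. 2 shows an example with P = 2
observations of the behaviour of data points x(t) in a
single-source zone.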
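The relationship in Eq. (6) can be checked numerically. The following is a minimal sketch, assuming P = 2, the two mappings tanh(u) + u and tanh(10u) (borrowed from the test section), and a hypothetical mixing column (a_1k, a_2k) = (1, 0.5); the isolated source samples and the bisection-based inverse are illustrative choices, not part of the proposed method.

```python
import math

def f1(u):
    """First sensor nonlinearity (assumed for illustration): tanh(u) + u."""
    return math.tanh(u) + u

def f2(u):
    """Second sensor nonlinearity (assumed for illustration): tanh(10 u)."""
    return math.tanh(10.0 * u)

def f1_inv(x, lo=-50.0, hi=50.0, iters=200):
    """Invert the strictly increasing f1 by bisection on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f1(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def phi_2k(u, a1k, a2k):
    """phi_2k(u) = f2((a2k / a1k) * f1^{-1}(u)), following Eq. (7)."""
    return f2((a2k / a1k) * f1_inv(u))

# Hypothetical column of the mixing matrix for the isolated source s_k.
a1k, a2k = 1.0, 0.5
# Hypothetical samples of the isolated source in one single-source zone.
s_k = [0.3 * math.sin(0.1 * n) for n in range(100)]
x1 = [f1(a1k * s) for s in s_k]
x2 = [f2(a2k * s) for s in s_k]

# In a single-source zone, every point (x1, x2) lies on the curve phi_2k.
max_dev = max(abs(b - phi_2k(a, a1k, a2k)) for a, b in zip(x1, x2))
```

Here `max_dev` measures how far the simulated observations are from the curve of Eq. (7); it stays at the level of numerical round-off because only one source is active.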

In an LI problem, the relationships (6) between observations
are much simpler. Let us first recall that in this case, the
mapping $f$ in Eq. (2) is set to the identity function, i.e. the
observations x(t) are equal to the signals z(t) defined in
Eq. (1). In the case when a source, say $s_k$, is isolated, the
observations read:

$$x_i(t) = a_{ik}\,s_k(t), \quad \forall i \in \{1, \ldots, P\},\ \forall t \in \mathcal{T}, \tag{8}$$

and we get the following relationship between observations $x_1$
and $x_i$ (see the left plot of Fig. 2 for an example with P = 2
observations):

$$a_{ik}\,x_1(t) - a_{1k}\,x_i(t) = 0, \quad \forall i \in \{1, \ldots, P\},\ \forall t \in \mathcal{T}. \tag{9}$$

In [3]-[7], the authors proposed finding such single-source zones
by means of a "single-source confidence measure" based on
asymmetrical [3] or symmetrical [4] ratios of observations,
correlation [5], [6], and local PCA [7]. In this paper, we need
to extend such measures to the considered PNL mixture, where
there is a functional relationship between observations.



A. Mutual information as a nonlinear correlation measure 

We explained above how authors provided ways to find
single-source zones when linear mixtures are applied to the
sources. For example, in the case where we estimate the
correlation coefficient of a pair of observations [5], [6], this
coefficient is equal to 1 in absolute value when one source is
isolated and is much lower otherwise. In the considered PNL
mixture, one needs to measure a nonlinear correlation between
observations. Mutual information $\mathcal{I}(x)$, defined as:

$$\mathcal{I}(x) = -E\left\{ \log \frac{\prod_{i=1}^{P} P_{x_i}(x_i)}{P_x(x)} \right\}, \tag{10}$$

where $E\{\cdot\}$ stands for expectation, and $P_x$ and $P_{x_i}$
($i \in \{1, \ldots, P\}$) are the joint and marginal probability density
functions of the observations respectively, provides such a
measure [14]: it takes null values when variables are independent
and much higher values otherwise. However, if we want
this measure to have the same behavior as linear correlation,
we need to normalize it, which is classically done as follows:

$$\mathcal{I}_{\mathrm{norm}}(x) = \sqrt{1 - e^{-2\,\mathcal{I}(x)}}. \tag{11}$$

This measure has also been used in [11] for another class of
nonlinear mixtures, and we use it in the same way as [11]:
a source is isolated in an analysis zone iff $\mathcal{I}_{\mathrm{norm}}(x) = 1$.
We thus only consider the analysis zones which maximize
Eq. (11).
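As an illustration of Eqs. (10) and (11) for P = 2, the sketch below estimates the mutual information with a plain two-dimensional histogram and then normalizes it. The histogram estimator, the number of bins, the Gaussian test signals and the mappings are all assumptions made for this example; they are not the estimator used in the experiments of this paper.

```python
import math
import random

def histogram_mi(x, y, bins=16):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram."""
    n = len(x)

    def index(v, lo, hi):
        # Map a value to its bin; the maximum falls in the last bin.
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    x_lo, x_hi, y_lo, y_hi = min(x), max(x), min(y), max(y)
    joint = {}
    px = [0] * bins
    py = [0] * bins
    for a, b in zip(x, y):
        i, j = index(a, x_lo, x_hi), index(b, y_lo, y_hi)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    # Sum of p(i,j) * log(p(i,j) / (p(i) p(j))) over the occupied cells.
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

def normalized_mi(x, y, bins=16):
    """Normalization of Eq. (11): sqrt(1 - exp(-2 I))."""
    mi = max(histogram_mi(x, y, bins), 0.0)
    return math.sqrt(1.0 - math.exp(-2.0 * mi))

rng = random.Random(0)
s = [rng.gauss(0.0, 0.3) for _ in range(5000)]       # hypothetical isolated source
other = [rng.gauss(0.0, 0.3) for _ in range(5000)]   # hypothetical unrelated source
x1 = [math.tanh(v) + v for v in s]
x2_single = [math.tanh(5.0 * v) for v in s]          # driven by the same source
x2_other = [math.tanh(5.0 * v) for v in other]       # driven by another source

single_score = normalized_mi(x1, x2_single)
other_score = normalized_mi(x1, x2_other)
```

With a finite histogram the score of the functionally related pair approaches, but does not reach, 1, while the unrelated pair yields a much lower value; this gap is what the zone selection relies on.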



B. Manifold learning based measures 

As an alternative to mutual information, note that since we assume
the NL mappings $f_i$ to be smooth, the resulting functions $\phi_{ik}$ are
also smooth and one may locally consider them as linear.
Such an idea is quite common in manifold learning [15] and
allows us to extend linear single-source approaches to PNL
configurations.

For example, the Local Tangent Space Approximation
(LTSA) [15, Sect. 3.2.4] method learns the manifold by
constructing the local tangent space of each observed data point.
We propose using such an idea to extend linear single-source
confidence measures [3]-[7] to PNL mixtures. Our approach
consists of successively considering each sample $x(t_i)$ of
the analysis zone $\mathcal{T}$, defining its neighborhood (estimated
by means of a K-nearest neighbor (K-NN) technique), and
applying a linear single-source confidence measure in this
neighborhood.

As in [7], we perform an eigendecomposition of the correlation
matrix of the data. If one source is isolated, then the rank
of the observations is equal to 1 and the highest eigenvalue
$\lambda_1(t_i)$ is non-negligible while the K - 1 other ones $\lambda_j(t_i)$
are close to zero. The authors of [7] proposed computing the
ratio between the highest eigenvalue and the sum of the others
to find single-source zones. In this paper, in order to keep
an analogy with the behavior of the correlation, we propose
computing the ratio⁴:



(12) 



which is close to 1 in single-source zones and much lower
otherwise. Once we compute the ratios $R(t_i)$ for all the data
points observed at times $t_i \in \mathcal{T}$, we derive the actual global
single-source confidence measure, denoted $\mathcal{R}(x)$ hereafter, as
the geometric mean of all these ratios:

$$\mathcal{R}(x) = \left( \prod_{t_i \in \mathcal{T}} R(t_i) \right)^{1/|\mathcal{T}|}, \tag{13}$$

where $|\mathcal{T}|$ is the cardinality of the zone $\mathcal{T}$, i.e. the number
of time samples it contains. The extension of other
single-source measures using this framework is straightforward. In
[5] and [6], the authors propose computing the covariance
or the correlation coefficient between observations. Here, we
compute the correlation coefficient between observations 1 and
j, in the neighborhood of the observed point $x(t_i)$ at time $t_i$,

$$C_{x_1 x_j}(t_i) = \frac{E\{x_1(t_i)\,x_j(t_i)\}}{\sqrt{E\{x_1(t_i)\,x_1(t_i)\}\,E\{x_j(t_i)\,x_j(t_i)\}}}, \tag{14}$$

we average this over the indices j (we then denote it
$C_x(t_i)$) and we derive the global single-source confidence
measure which is well-suited to PNL mixtures, denoted by
$\mathcal{C}(x)$ hereafter:

$$\mathcal{C}(x) = \left( \prod_{t_i \in \mathcal{T}} C_x(t_i) \right)^{1/|\mathcal{T}|}. \tag{15}$$

Note that other linear single-source confidence measures
might be extended as well, as in the work in [3] and [4] for
example.

⁴ Note that such an idea has been proposed in [16] for selecting "simple
autoterms" of bilinear time-frequency transforms.
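The two manifold-learning-based measures can be sketched as follows for P = 2. Several choices here are assumptions made for the sake of a runnable example: the local ratio is taken as the largest eigenvalue over the sum of all eigenvalues (one form that is close to 1 in single-source zones), each neighborhood is centered before computing its second-order statistics, the absolute value of the local correlation is used, and the test signals and the mild mapping tanh(2u) are hypothetical.

```python
import math
import random

def knn(points, idx, k):
    """Indices of the k nearest neighbours of points[idx] (itself included)."""
    px, py = points[idx]
    order = sorted(range(len(points)),
                   key=lambda j: (points[j][0] - px) ** 2 + (points[j][1] - py) ** 2)
    return order[:k]

def local_stats(points, idx, k):
    """Centered second-order statistics of the neighbourhood of points[idx]."""
    nb = [points[j] for j in knn(points, idx, k)]
    m1 = sum(p[0] for p in nb) / k
    m2 = sum(p[1] for p in nb) / k
    c11 = sum((p[0] - m1) ** 2 for p in nb) / k
    c22 = sum((p[1] - m2) ** 2 for p in nb) / k
    c12 = sum((p[0] - m1) * (p[1] - m2) for p in nb) / k
    return c11, c22, c12

def ltsa_measures(points, k=10, tiny=1e-12):
    """Geometric means over the zone of the local PCA ratio and |correlation|."""
    log_r = log_c = 0.0
    for idx in range(len(points)):
        c11, c22, c12 = local_stats(points, idx, k)
        trace = c11 + c22
        gap = math.sqrt((c11 - c22) ** 2 + 4.0 * c12 ** 2)
        lam1 = 0.5 * (trace + gap)              # largest eigenvalue of the 2x2 matrix
        ratio = lam1 / (trace + tiny)           # assumed form of the local ratio
        corr = abs(c12) / (math.sqrt(c11 * c22) + tiny)
        log_r += math.log(max(ratio, tiny))
        log_c += math.log(max(corr, tiny))
    n = len(points)
    return math.exp(log_r / n), math.exp(log_c / n)

def f1(u):
    return math.tanh(u) + u

def f2(u):
    return math.tanh(2.0 * u)                   # hypothetical mild nonlinearity

rng = random.Random(0)
single, multi = [], []
for _ in range(200):
    s1, s2 = rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)
    single.append((f1(s1), f2(-0.9 * s1)))                 # only s1 active
    multi.append((f1(s1 + s2), f2(-0.9 * s1 + 0.5 * s2)))  # two active sources

r_single, c_single = ltsa_measures(single)
r_multi, c_multi = ltsa_measures(multi)
```

In the single-source zone both geometric means are close to 1 because every neighborhood is nearly a line segment; in the two-source zone the neighborhoods are two-dimensional clouds and both values drop.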

C. Constant sources and extended single-source confidence 
measures 

So far, we have proposed single-source confidence measures
assuming that the energy of the absent sources is very low. A
problem may appear if, in a zone $\mathcal{T}$, sources $s_j(t)$ are constant
but not null⁵. We denote these constant values by $\bar{s}_j$. In this
situation, our above single-source confidence measures are
still equal to 1 but Eq. (4) then becomes

$$x_i(t) = f_i\big(a_{ik}\,s_k(t) + \alpha_{i,k}(\mathcal{T})\big), \quad \forall i \in \{1, \ldots, P\},\ \forall t \in \mathcal{T}, \tag{16}$$

where $\alpha_{i,k}(\mathcal{T}) = \sum_{j \neq k} a_{ij}\,\bar{s}_j$. Eqs. (5) and (6) then respectively read

$$s_k(t) = \frac{f_i^{-1}\big(x_i(t)\big) - \alpha_{i,k}(\mathcal{T})}{a_{ik}}, \quad \forall i \in \{1, \ldots, P\}, \tag{17}$$

and

$$x_i(t) = f_i\!\left(\frac{a_{ik}}{a_{1k}}\, f_1^{-1}\big(x_1(t)\big) - \frac{a_{ik}}{a_{1k}}\,\alpha_{1,k}(\mathcal{T}) + \alpha_{i,k}(\mathcal{T})\right). \tag{18}$$

Let us recall that we are looking for zones where all the
constant coefficients $\alpha_{i,k}(\mathcal{T})$ are zero. As we are applying
the proposed approach to speech signals, the situation where
one can find an index i such that $\alpha_{i,k}(\mathcal{T}) \neq 0$ will not occur.
Additionally, due to Assumption 1.3, we know the value of
each nonlinear function is zero at zero and we can estimate
$\phi_{ik}$, the nonlinear relationship between observations defined
in Eq. (7) (see Section IV), and discard the zones where
$\phi_{ik}(0) \neq 0$.

Finally, from a theoretical point of view, we should look for
analysis zones which satisfy:

$$\begin{cases} \mathrm{SSCM}(x) = 1 \\ \phi_{ik}(0) = 0 \quad \forall i \in \{2, \ldots, P\}, \end{cases} \tag{19}$$

where SSCM(x) is one of the single-source confidence measures
defined in Eqs. (11), (13), and (15). However, in practice,
we only consider zones which approximately satisfy the
criterion (19), i.e. we look for zones $\mathcal{T}$ such that:

$$\begin{cases} \mathrm{SSCM}(x) > 1 - \epsilon_1 \\ |\phi_{ik}(0)| < \epsilon_2 \quad \forall i \in \{2, \ldots, P\}, \end{cases} \tag{20}$$

where $\epsilon_1$ and $\epsilon_2$ are some user-defined thresholds.

⁵ Such a scenario is not a problem in LI-SCA: observations can be locally
centered in each analysis zone, thus zeroing the constant signals [5]. Moreover,
such constant sources provide the main difference between our proposed
method and [11]: we are looking for zones where all the sources are zero
except one while the authors of [11] are looking for zones where all the
sources are strictly positive constants except one.
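The practical criterion of Eq. (20) amounts to a simple filter over the analysis zones. A minimal sketch is given below; the dictionary layout, the field names `sscm` and `phi0`, and the three example zones are hypothetical, while the default thresholds are the values 0.01 and 0.1 used in the tests of this paper.

```python
def select_zones(zones, eps1=0.01, eps2=0.1):
    """Keep the zones that satisfy both conditions of Eq. (20).

    Each zone is a dict with:
      'sscm' : value of the chosen single-source confidence measure,
      'phi0' : estimates of phi_ik(0) for i = 2, ..., P.
    """
    return [z for z in zones
            if z["sscm"] > 1.0 - eps1
            and all(abs(v) < eps2 for v in z["phi0"])]

# Hypothetical zones: a clean single-source zone, a multi-source zone, and a
# zone where another source is constant but not null (offset at the origin).
zones = [
    {"name": "clean", "sscm": 0.998, "phi0": [0.02]},
    {"name": "multi", "sscm": 0.610, "phi0": [0.01]},
    {"name": "offset", "sscm": 0.997, "phi0": [0.35]},
]
kept = [z["name"] for z in select_zones(zones)]
```

Only the first zone is retained: the second fails the confidence test and the third fails the test on the value of the curve at the origin.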



IV. Functional data clustering 

If we consider all the single-source analysis zones, then we
get a subset of the original observations where the approximate
WDO assumption holds. We can thus use the clustering
techniques proposed in [8], [9]. However, in [8], the authors
propose a geometrical preprocessing which is not robust to
noise in general and in particular not to non-ideal single-source
zones⁶. On the other hand, [9] proposes the use of a spectral
clustering technique in order to separate the curves, and thus
the sources. Spectral clustering techniques are well suited to
nonlinearities in the data and are more robust to noise than
the approach proposed in [8]. However, such techniques are
sensitive to the distance between the curves and do not allow
the clusters to overlap. This last criterion is obviously not
satisfied in the BSS framework, at least around zero where all
the clusters meet, and the authors of [9] proposed a solution
for this last case: they remove the points of x which are close
to zero, and try to find 2N clusters. By assuming that the
nonlinear mappings are almost linear for the lowest values of
x, they link the half-clusters.

In this paper, we propose taking advantage of the
single-source analysis zones to cluster our data. Indeed, in each
single-source zone, as we saw above (see Fig. 2), all the
points belong to the same functional curve and give us extra
information which is not provided in [8], [9]. We can thus
cluster the data according to these zones. The underlying idea
can be seen as an extension of the scale-coefficient clustering of
linear SCA [3]-[7] to nonlinear mixtures: while the linear relationships
between observations were limited to scale coefficients to be
clustered, here we have to cluster the scattered curves observed
in the single-source zones, i.e. to estimate some parameters
adequately describing the functions $\phi_{ik}$ defined in Eq. (7)
in order to form clusters. Such techniques are named functional
data clustering, and belong to a rich topic in mathematics,
named functional data analysis, which aims to study infinite
dimensional data as functions [17].

Many approaches for clustering functional data belong to
one of the two following families. (i) The regularization
approaches consist of successively interpolating each observed
scattered function, discretizing all of them on the same
time grid, and lastly clustering them as
high-dimensional vectors. However, these vectors are often highly correlated
and may lead to unstable solutions, because of the curse
of dimensionality [18]. (ii) The filtering methods consist of
approximating each curve with respect to a common finite
dimensional basis, and then clustering the resulting basis
coefficients.

A description of a simple filtering approach, using B-splines,
is provided in the following subsection, while we derive an
original clustering method from our considered PNL problem
in Section IV-B.

⁶ An ideal single-source zone is an analysis zone where all the sources
except one are exactly zero.

A. Filtering functional data clustering 

Given an interval $[x_1(t_b), x_1(t_e)]$, we define a subdivision
$\xi_0 = x_1(t_b) < \xi_1 < \xi_2 < \ldots < \xi_K < \xi_{K+1} = x_1(t_e)$.

The points $\xi_i$ are named knots. Note that the same knot may be
repeated several times, say p times. We then say that it is a
multiple knot of multiplicity order p. We aim to fit the curve
$\{(x_1(t_j), x_i(t_j))\}_{j=1,\ldots,M}$ on such an interval by using splines.
A spline is a polynomial of degree d (or order d + 1) on any
interval $[\xi_{i-1}, \xi_i)$ which has d - 1 continuous derivatives on the
open interval $(x_1(t_b), x_1(t_e))$. For a fixed sequence of knots,
the set of such splines is a linear space of functions with
K + d + 1 free parameters. A useful basis $\{B_{1,d}, \ldots, B_{K+d+1,d}\}$
for this linear space is given by B-splines [20], recursively
defined as

$$B_{i,0}(t) = \begin{cases} 1 & \text{if } \xi_i \leq t < \xi_{i+1} \\ 0 & \text{otherwise,} \end{cases} \qquad
B_{i,u}(t) = \frac{t - \xi_i}{\xi_{i+u} - \xi_i}\, B_{i,u-1}(t) + \frac{\xi_{i+u+1} - t}{\xi_{i+u+1} - \xi_{i+1}}\, B_{i+1,u-1}(t). \tag{21}$$

A spline, denoted $C(x_1, \beta)$ hereafter, can now be written with
respect to the above basis:

$$C(x_1, \beta) = \sum_{l=1}^{K+d+1} \beta_l\, B_{l,d}(x_1), \tag{22}$$

where $\beta = [\beta_1, \ldots, \beta_{K+d+1}]^T$ are the spline coefficients. For
a set of fixed knots, the coefficients $\beta$ may be estimated as
a linear problem. Once the basis is computed according to
Eq. (21), the application of the B-splines is not more difficult
than polynomial regression. Let $\{(x_1(t_j), x_i(t_j))\}_{j=1,\ldots,M}$ be
a regression-type data set of M measurements of the curve
$\phi_{ik}$ defined in Eq. (7). The spline coefficients are estimated as

$$\hat{\beta}_i = \arg\min_{\beta} \frac{1}{M} \sum_{j=1}^{M} \big(x_i(t_j) - C(x_1(t_j), \beta)\big)^2. \tag{23}$$

Once the knots are fixed, the estimated coefficients $\hat{\beta}_i$ describe
the function shape. If we use the same knots for all the
single-source analysis zones selected in Section III, then all
the K + d + 1 B-spline coefficients have the same meaning
and may be compared. If two curves have close estimated
B-spline coefficients, then they should be associated with
the same source. Otherwise, they should be associated with
different sources. Clustering techniques can be applied to such
coefficients [19]: while the authors of [19] used K-means to
this end, we propose to use the median-based version of
K-means, named K-medians, which has been used in [6]. Other
approaches, such as DEMIX [7], may also be employed. Let us
recall that prior to the clustering stage, we discard the
single-source zones $\mathcal{T}$ which do not satisfy Eq. (19). In practice,
$\phi_{ik}(0)$ is estimated as the value of the spline $C(x_1, \hat{\beta})$ at zero.
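The filtering approach can be sketched with the standard Cox-de Boor recursion and an ordinary least-squares fit solved through the normal equations. Everything specific in this example is an assumption: the clamped knot vector, the degree 3, the sampling grids, and the stand-in curves tanh(r u), which correspond to a PNL mixture with an identity first mapping and a tanh second mapping.

```python
import math

def bspline_basis(i, u, t, knots):
    """Cox-de Boor recursion: i-th B-spline of degree u evaluated at t."""
    if u == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + u] - knots[i]
    d2 = knots[i + u + 1] - knots[i + 1]
    if d1 > 0:   # skip terms made degenerate by repeated knots
        left = (t - knots[i]) / d1 * bspline_basis(i, u - 1, t, knots)
    if d2 > 0:
        right = (knots[i + u + 1] - t) / d2 * bspline_basis(i + 1, u - 1, t, knots)
    return left + right

def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_spline(x1, xi, knots, degree):
    """Least-squares B-spline coefficients for the scattered curve (x1, xi)."""
    n_basis = len(knots) - degree - 1
    rows = [[bspline_basis(l, degree, t, knots) for l in range(n_basis)] for t in x1]
    gram = [[sum(r[p] * r[q] for r in rows) for q in range(n_basis)]
            for p in range(n_basis)]
    rhs = [sum(r[p] * y for r, y in zip(rows, xi)) for p in range(n_basis)]
    return solve(gram, rhs)

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def phi(u, ratio):
    """Stand-in curve: f_1 = identity, f_i = tanh (hypothetical mappings)."""
    return math.tanh(ratio * u)

degree = 3
knots = [-2.0] * 4 + [-1.0, 0.0, 1.0] + [2.0] * 4      # hypothetical clamped knots
grid_a = [-1.9 + 3.8 * j / 39 for j in range(40)]      # samples of a first zone
grid_b = [-1.85 + 3.7 * j / 49 for j in range(50)]     # samples of a second zone
beta_a = fit_spline(grid_a, [phi(u, 0.5) for u in grid_a], knots, degree)
beta_b = fit_spline(grid_b, [phi(u, 0.5) for u in grid_b], knots, degree)
beta_c = fit_spline(grid_a, [phi(u, -0.9) for u in grid_a], knots, degree)
same_source = distance(beta_a, beta_b)
other_source = distance(beta_a, beta_c)
```

Because all zones share the same knots, the coefficient vectors are directly comparable: two zones driven by the same source give nearly identical coefficients, whereas a zone driven by another source lies far away, which is what the subsequent clustering exploits.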

B. Clustering using locally-linear assumptions 

In the previous section, we presented a clustering method
developed for functional data. However, it suffers from the fact
that finding optimal knot locations for the set of scattered data
is data-dependent. Moreover, there are relationships, named
Schoenberg-Whitney conditions, between the degree of the
B-splines, the knot locations and the number of points between
two knots to be satisfied [21].

We now propose an alternative solution based on the
locally linear data approximation around zero. Indeed, in many
systems where PNL mixtures hold, the NL function due to
the sensor response, e.g. microphones, is almost linear around
zero. The first order of the Taylor series expansion of the
functions $\phi_{ik}$ reads

$$\phi_{ik}(t) = \phi_{ik}(0) + \phi'_{ik}(0)\,t + O(t^2) \approx \phi'_{ik}(0)\,t. \tag{24}$$

Eq. (24) thus reveals that the scattered functional curve is
approximated by the slope of its tangent at zero and we can
use this slope as a way to cluster the estimated functions.
Moreover, speech signals (that we aim to process in this paper)
tend to be distributed around zero in single-source zones $\mathcal{T}$
and we get a high probability of having many data points
in the neighborhood of zero in each single-source zone $\mathcal{T}$.
Estimating $\phi'_{ik}(0)$ can be done by LI-SCA, since Eq. (6)
combined with Eq. (24) is equivalent to Eq. (9). As in Section
III-B, we propose using the formalism of manifold learning, and
estimating the neighborhood of zero with the K-NN method,
before applying one of the LI-SCA methods [3]-[7] to estimate
the slope $\phi'_{ik}(0)$. As explained in Section III-C, we discard
single-source zones $\mathcal{T}$ which do not satisfy Eq. (19). Note that
$\phi_{ik}(0)$ is the value at the origin of the line defined in Eq. (24).
In practice, we estimate it using a least-squares regression
technique. We finally cluster the retained curves by applying
a clustering method, e.g. K-medians, on these slopes.
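The locally linear variant can be sketched as follows for P = 2: in each single-source zone, the K samples closest to zero are kept, a line is fitted by least squares, and the slopes are clustered with a one-dimensional K-medians. The mixing columns and mappings are read from the test configuration of Section VI; the synthetic Gaussian sources, the choice of |x_1| as the distance to zero, K = 15, and the deterministic initialization are assumptions of this example.

```python
import math
import random
import statistics

def f1(u):
    return math.tanh(u) + u

def f2(u):
    return math.tanh(10.0 * u)

def local_line(x1, xi, k):
    """Least-squares line through the k samples whose x1 is closest to zero."""
    idx = sorted(range(len(x1)), key=lambda j: abs(x1[j]))[:k]
    u = [x1[j] for j in idx]
    v = [xi[j] for j in idx]
    mu, mv = statistics.mean(u), statistics.mean(v)
    sxx = sum((a - mu) ** 2 for a in u)
    slope = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / sxx
    return slope, mv - slope * mu        # slope and value at the origin

def k_medians_1d(values, k, iters=20):
    """One-dimensional K-medians with a deterministic, spread-out start."""
    ordered = sorted(values)
    centres = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = statistics.median(members)
    return labels, centres

columns = [(1.0, -0.9), (1.0, 0.5), (0.9, 1.0)]   # (a_1k, a_2k) per source
rng = random.Random(0)
slopes, intercepts, truth = [], [], []
for k_src, (a1, a2) in enumerate(columns):
    for _ in range(4):                            # four hypothetical zones per source
        s = [rng.gauss(0.0, 0.05) for _ in range(100)]
        x1 = [f1(a1 * v) for v in s]
        x2 = [f2(a2 * v) for v in s]
        slope, intercept = local_line(x1, x2, k=15)
        slopes.append(slope)
        intercepts.append(intercept)
        truth.append(k_src)

labels, centres = k_medians_1d(slopes, 3)
# Theoretical slopes: f2'(0) a_2k / (f1'(0) a_1k) = 5 a_2k / a_1k.
expected = sorted(5.0 * a2 / a1 for a1, a2 in columns)
```

The estimated intercepts play the role of the value of the curve at the origin and can be compared with the second threshold of the zone-selection criterion before the slopes are clustered.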

V. Nonlinear inversion and linear identification 

Once the nonlinear functions are estimated, we have to
invert them and apply these inverse functions to the
observations $x(t)$, in order to get the linearly mixed signals
$e(t) = [e_1(t), \ldots, e_P(t)]^T$ (see Fig. 1). This is straightforward
by e.g. applying one of the neural-network-based methods
proposed in [8], [9], which use the same property but are
implemented differently. The underlying common property
was first defined for PNL-ICA methods [1, Ch. 14] and is
adapted to PNL-SCA as follows: we estimate a nonlinear
mapping $g = [g_1, \ldots, g_P]^T$ such that for all indices i, k, the
compound function $g_i \circ \phi_{ik}$ is linear (see Fig. 1). To this end,
[8] proposes finding a linear relationship between the same
components of different clusters while [9] suggests finding a
linear relationship between different components in the same
cluster.

Once the nonlinearities are inverted, we thus obtain a
classical LI-SCA problem. The estimation of the linear mixing
parameters is then straightforward if we have estimated the N
nonlinear curves: once we have linearized the clusters obtained
in Section IV, each cluster fits a line whose parameters
(defined in Eq. (9)) may be estimated using a criterion
proposed in [3]-[7].

If we estimated more than P curves but fewer than N, we
are still able to invert the nonlinearities. However, we then
do not have all the linearized curves and we thus have
to estimate the linear mixing parameters with a complete
linear SCA approach, probably by first applying a linear
sparsifying transform to $e(t)$, in order to find a zone associated
with each source.

Note that so far, we have mainly focused on the first
stages of the complete approaches, which contain our proposed
novel criteria and for which the performance is investigated in
Section VI. Testing the inversion is outside the scope of the
paper.

VI. Tests 

In this section, we test the performance of our proposed
approaches on PNL mixtures of speech. Indeed, and contrary
to [8], [9], we test our approach on simulations using real
speech signals which can be locally sparse in the time domain,
due to silence of speakers [5].

Before investigating the performance of each criterion used
in each stage of the approaches, we illustrate the behaviour of
the proposed methods with an example of N = 3 sources and
P = 2 sensors, i.e. an underdetermined mixture. The source
signals are three speech signals, which are sampled at 20 kHz,
last 5 s, and contain silent parts. These sources are presented
in Fig. 3.

We mixed them with the following mixing matrix:

$$A = \begin{bmatrix} 1 & 1 & 0.9 \\ -0.9 & 0.5 & 1 \end{bmatrix} \tag{25}$$

and then applied the following nonlinear mappings, proposed
in the literature,

$$f_1(t) = \tanh(t) + t, \qquad f_2(t) = \tanh(10\,t), \tag{26}$$

to the resulting signals $z(t)$. Note that the mixing matrix
A is close to being an absolutely degenerate matrix, and
thus the configuration under consideration is challenging.
Moreover, the nonlinear functions have been chosen so that
they can model audio effects like soft-clipping. Observations
are shown in Fig. 3 and one may see the strong nonlinearities
in observation $x_2(t)$.
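The mixing configuration above can be reproduced with a few lines. The sketch below assumes that the mixing matrix is the 2 x 3 matrix whose columns are (1, -0.9), (1, 0.5) and (0.9, 1); the Gaussian signals stand in for the speech sources and are purely hypothetical. It also quantifies why the matrix is close to absolutely degenerate by comparing the normalized absolute columns.

```python
import math
import random

# Mixing matrix with P = 2 rows and N = 3 columns (assumed layout).
A = [[1.0, 1.0, 0.9],
     [-0.9, 0.5, 1.0]]

def f(z):
    """Componentwise nonlinear mappings: tanh(z1) + z1 and tanh(10 z2)."""
    return [math.tanh(z[0]) + z[0], math.tanh(10.0 * z[1])]

def mix(s):
    """PNL mixture x = f(A s) of one sample s of the three sources."""
    z = [sum(A[i][j] * s[j] for j in range(3)) for i in range(2)]
    return f(z)

def abs_unit_column(j):
    """Absolute value of the j-th column of A, scaled to unit norm."""
    col = [abs(A[i][j]) for i in range(2)]
    norm = math.sqrt(sum(c * c for c in col))
    return [c / norm for c in col]

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

rng = random.Random(1)
# Hypothetical stand-in sources (the experiments use real speech signals).
sources = [[rng.gauss(0.0, 0.2) for _ in range(3)] for _ in range(1000)]
x = [mix(s) for s in sources]

# Columns 1 and 3 nearly coincide in absolute value, unlike columns 1 and 2.
d13 = distance(abs_unit_column(0), abs_unit_column(2))
d12 = distance(abs_unit_column(0), abs_unit_column(1))
```

The small value of `d13` compared with `d12` illustrates the near-degeneracy of the first and third columns, and the second observation is confined to [-1, 1] by the steep tanh mapping.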

We set the size of our temporal analysis zones to 100 
samples. Mutual information is estimated using the approach 
in II22I . Fig. [3] shows the plot of speech sources and the 
obtained normalized mutual information measures. We can see 
that 2norm(2l) is close to 1 when one source is isolated. We 
then considered all the zones where ei and ^2, defined in Eq. 
( fT9] l, are set to 0.01 and 0.1, respectively. Fig. |4] shows two 
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Fig. 3. Normalised mutual information measures (bottom plot on the right 
part of the figure) between two PNL mixtures (upper plots on the right part 
of the figure) of three speech sources (plotted on the left part of the figure). 
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Fig. 4. Scatter plots between observations. Top: on the full signals. Bottom: 
on the single-source analysis zones. 



scatter plots: on the top, we provide the scatter plot of the 
original observations. It is clear that the sparsity assumptions 
needed in IS), |l9l are not satisfied at all. In the bottom plot, 
we only show the scatter plots obtained from zones satisfying 
Eq. (fT9] l. Here, we can see three curves associated with 
nonlinearities. This thus shows the relevance of the single- 
source confidence measures and an easy way to improve the 
work of [8], ||9l- We then estimated the different curves on the 
local scatter plots using B-spline approximations. Because the 
choice of the knots is data-dependent, we decided to perform 
a "coarse" fitting, i.e. an approximation whose knot locations 
and B-spIine order are not necessarily optimised but that allow 
us to separate the curves of the functions defined in Eq. 
(I7]l. In the example provided here, we used the following knots 



Vi e {0,...,6}, 6 = -1-5 + 0.5- i, 



(27) 



without multiplicity order and knot-end conditions. We set the
degree of the B-spline to 4, in order to obtain smooth estimates
of the curves. We then obtained the B-spline coefficients $\beta$,
which we clustered using K-medians. Fig. 5 shows the separated
curves obtained after classification, i.e. the superposition of the
local scatter plots on the zones belonging to the same cluster.
Such separated curves then allowed us to estimate the inverse
nonlinear mappings.
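
The clustering step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the experiments: the coefficient vectors are made-up numbers standing in for per-zone B-spline coefficients, the initial centers are chosen by hand, and the function names `l1_distance` and `k_medians` are hypothetical. Only the knot formula comes from Eq. (27).

```python
from statistics import median


def l1_distance(u, v):
    """Sum of absolute coordinate differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def k_medians(vectors, initial_centers, n_iter=50):
    """Cluster vectors with K-medians; returns (labels, centers)."""
    centers = [list(c) for c in initial_centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each vector goes to the nearest center (L1 distance).
        labels = [
            min(range(len(centers)), key=lambda k: l1_distance(v, centers[k]))
            for v in vectors
        ]
        # Update step: coordinate-wise median of the members of each cluster.
        new_centers = []
        for k, old in enumerate(centers):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                new_centers.append([median(col) for col in zip(*members)])
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Knots of Eq. (27): seven evenly spaced values from -1.5 to 1.5.
knots = [-1.5 + 0.5 * i for i in range(7)]

# Hypothetical per-zone coefficient vectors (illustrative values, two groups).
coefficients = [
    [0.0, 1.0, 2.0], [0.1, 1.1, 2.1], [0.2, 0.9, 1.9],
    [5.0, 4.0, 3.0], [5.1, 4.2, 2.9],
]
labels, centers = k_medians(
    coefficients, initial_centers=[coefficients[0], coefficients[3]]
)
```

Each label identifies the curve (and hence the source) to which a single-source zone is assigned; the median update makes the centers less sensitive to outlying zones than a mean would be.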

This inversion can be done by e.g. applying one of the 

Fig. 5. Scatter plots of the separated curves after the clustering procedure
(in black) and estimated nonlinear functions (in gray).



linearizing methods proposed in [8]. This linearization
succeeds if the nonlinear mappings are estimated accurately,
which is the goal of this paper. We measured this
accuracy by computing B-spline functions of order 4, with 20
knots. The obtained curves, drawn in Fig. 5, fit the scatter plots
very well and are really close to the theoretical ones: the
mean square errors (MSEs) on the sampled curves are equal
to 2.5e-4, 5.3e-5, and 2.1e-5.

We now characterize
the performance of each proposed criterion for both finding
single-source analysis zones and clustering the functional data.

A. Comparison of the single-source confidence measures 

In this section, we test the performance of the proposed
single-source confidence measures. To do so, we propose
an original experiment which consists of generating linear
instantaneous (LI) mixtures $z(t)$ as defined in Eq. (1) and their PNL
versions $x(t)$ defined in Eq. (2), and then computing the linear and
nonlinear single-source confidence measures over $z(t)$ and
$x(t)$, respectively, in the same zones $\mathcal{T}$. These tests allow
us to derive some statistics on the behaviour of each
considered single-source confidence measure. We hereafter
denote the considered nonlinear single-source confidence measure
by SSCM(x), and the associated linear one by SSCM(z).
From now on, we consider that a zone is selected as single-source
in the linear mixtures (respectively PNL mixtures)
if SSCM(z) > $\eta_z$ (respectively SSCM(x) > $\eta_x$), where
$\eta_x$ and $\eta_z$ denote the thresholds associated with the
single-source confidence measures for PNL and LI mixtures,
respectively. Conversely, when these measures are below
these thresholds, we consider the zone $\mathcal{T}$ to be multiple-source.

We then define (i) TP($\eta_x, \eta_z$), the true positive cases of
zones $\mathcal{T}$ detected as single-source in both the linear mixtures
and the PNL ones, (ii) FN($\eta_x, \eta_z$), the false negative cases of
zones detected as single-source in the linear mixtures but not
in the PNL ones, (iii) FP($\eta_x, \eta_z$), the false positive cases of
zones detected as single-source in the PNL mixtures but not in
the linear ones, and (iv) TN($\eta_x, \eta_z$), the true negative cases
where the zones $\mathcal{T}$ are detected as single-source in neither
the linear mixtures nor the PNL ones.

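From these cases, we derive the sensitivity and the specificity as:

$$\text{Sensitivity}(\eta_x, \eta_z) = \frac{\text{TP}(\eta_x, \eta_z)}{\text{TP}(\eta_x, \eta_z) + \text{FN}(\eta_x, \eta_z)}, \tag{28}$$

$$\text{Specificity}(\eta_x, \eta_z) = \frac{\text{TN}(\eta_x, \eta_z)}{\text{FP}(\eta_x, \eta_z) + \text{TN}(\eta_x, \eta_z)}. \tag{29}$$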



These quantities may be interpreted as follows. The sensitivity
may be seen as the probability of correctly detecting single-source
zones in PNL mixtures. When this probability is low,
we may discard zones $\mathcal{T}$ that would
be seen as single-source by the linear measures, i.e. some
single-source zones are "invisible". This might not affect the
global performance of the method much if the total number of
single-source zones is high, but it might if there are
few single-source zones^7. The specificity may be seen as the
probability of correctly discarding multiple-source zones. If
this probability is low, we might detect zones
$\mathcal{T}$ as single-source when they are not, thus yielding an inaccurate
estimation of the nonlinear functions in the next stages of the
proposed approach. In our considered PNL problem, a low
specificity is much more harmful than a low sensitivity and
must be avoided as much as possible.
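
The four cases and the two ratios of Eqs. (28) and (29) can be illustrated with a short sketch. The confidence values below are made-up numbers, and the function names are hypothetical; the linear detection plays the role of the reference, as in the text.

```python
def confusion_counts(sscm_x, sscm_z, eta_x, eta_z):
    """Count TP, FN, FP, TN over zones, the linear detection being the reference."""
    tp = fn = fp = tn = 0
    for measure_x, measure_z in zip(sscm_x, sscm_z):
        pnl_single = measure_x > eta_x     # zone kept in the PNL mixtures
        linear_single = measure_z > eta_z  # zone kept in the linear mixtures
        if linear_single and pnl_single:
            tp += 1
        elif linear_single:
            fn += 1
        elif pnl_single:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def sensitivity(tp, fn):
    """Eq. (28); undefined (NaN) when no zone is single-source in the linear case."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


def specificity(fp, tn):
    """Eq. (29); undefined (NaN) when every zone is single-source in the linear case."""
    return tn / (fp + tn) if (fp + tn) else float("nan")


# Illustrative confidence measures for six zones (not measured data).
sscm_z = [0.95, 0.97, 0.20, 0.40, 0.99, 0.10]
sscm_x = [0.93, 0.50, 0.30, 0.92, 0.96, 0.05]
tp, fn, fp, tn = confusion_counts(sscm_x, sscm_z, eta_x=0.9, eta_z=0.9)
sens = sensitivity(tp, fn)
spec = specificity(fp, tn)
```

In this toy case the fourth zone is a false positive: it would feed a multiple-source scatter plot to the clustering stage, which is the situation the text identifies as the most harmful.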

We generate 252 PNL mixtures (28 source pairs, 3 mixing matrices,
and 3 sets of nonlinear functions): we consider 28 pairs of
$N = 2$ speech sources, which last 5 s, include silent parts,
and are sampled at 20 kHz, that we mix with the following
$P \times P$ mixing matrices:

$$A = \begin{bmatrix} 1 & \lambda \\ -\lambda & 1 \end{bmatrix}, \tag{30}$$

with $\lambda = 0.9$, 0.5, and 0.1. We then apply one of the following
sets of nonlinear functions: the one in Eq. (26), tested in lilH,
or

$$f_1(t) = \tanh(t) + 0.1 \cdot t, \qquad f_2(t) = t, \tag{31}$$

which has been tested in [8], and

$$f_1(t) = f_2(t) = \tanh(t), \tag{32}$$

tested in [9].
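
As a hedged illustration of how one such PNL mixture is formed, the sketch below applies the mixing matrix of Eq. (30) followed by the nonlinearities of Eq. (31), component by component. The function names and the sample values are assumptions made for the example; real speech signals would replace the short lists.

```python
import math


def mixing_matrix(lam):
    """2 x 2 mixing matrix of Eq. (30)."""
    return [[1.0, lam], [-lam, 1.0]]


def f1(u):
    """First nonlinearity of Eq. (31)."""
    return math.tanh(u) + 0.1 * u


def f2(u):
    """Second nonlinearity of Eq. (31): the identity."""
    return u


def pnl_mix(sources, lam, nonlinearities=(f1, f2)):
    """Linear instantaneous mixing followed by one nonlinearity per observation."""
    matrix = mixing_matrix(lam)
    s1, s2 = sources
    observations = []
    for row, f in zip(matrix, nonlinearities):
        linear_mix = [row[0] * a + row[1] * b for a, b in zip(s1, s2)]
        observations.append([f(value) for value in linear_mix])
    return observations


# One sample per source, purely illustrative.
x1, x2 = pnl_mix(([1.0], [2.0]), lam=0.5)

# Number of tested mixtures: source pairs x mixing matrices x function sets.
n_mixtures = 28 * 3 * 3
```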

The cardinality of the zones $\mathcal{T}$ is set to 100 and the temporal
overlap between the zones is set to 10%, thus generating 1111
analysis zones per PNL mixture. As in the underdetermined
illustrative example described at the beginning of this section,
mutual information is estimated using the approach in [22].
The eigenvalue decomposition is processed through a classical
PCA [17]^8. As the locally-linear approaches described in
Section III-B use a K-NN method for estimating the linear
subspace, we vary K over {5, 10, 15, 20}.
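
The figure of 1111 zones per mixture follows from the signal length (5 s at 20 kHz, i.e. 100000 samples), the zone length, and the overlap. A minimal sketch, assuming that only complete zones are kept and that the overlap is rounded to a whole number of samples (both assumptions of this example):

```python
def zone_starts(n_samples, zone_length=100, overlap=0.10):
    """Start indices of the complete analysis zones of a signal."""
    hop = zone_length - round(overlap * zone_length)  # 90 samples between starts
    return list(range(0, n_samples - zone_length + 1, hop))


n_samples = 5 * 20_000  # 5 s sampled at 20 kHz
starts = zone_starts(n_samples)
n_zones = len(starts)
```

With a hop of 90 samples, the last zone starts at sample 99900 and ends exactly at the end of the signal, so no partial zone is discarded in this configuration.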

Figure 6 shows the sensitivity and the specificity for all the
tested single-source confidence measures, for thresholds $\eta_z$
and $\eta_x$ ranging from 0 to 0.99 with a step-size of 0.01, except
for the LTSA-PCA measure^9, where the thresholds range from
0.5 to 0.99. The plots may be analyzed as follows. The mutual
information provides almost the same measures for both the



^7 Indeed, if there are few single-source zones and a non-negligible part of
them cannot be found, then we estimate the linear functions from extremely few
zones, which might result in an inaccurate estimation of the nonlinear functions.

^8 We also tested an alternative eigenvalue decomposition using a Schur
decomposition, as proposed in [15]. However, we did not notice any major
difference between this decomposition and the PCA-based one [17].

^9 Indeed, the LTSA-PCA measure and its linear version are based on the
ratio of eigenvalues. In the case of $P = 2$ observations, the lowest eigenvalue
is between 0 and 1, hence the range of the ratio, which goes from 0.5 to 1.

Fig. 6. Sensitivity and specificity, with respect to the value of $\eta_z$ and $\eta_x$, for the tested single-source confidence measures. White (respectively black)
color means that the measure is equal to 1 (respectively 0). (a) Normalized mutual information. (b)-(e) LTSA-PCA with K = 5, 10, 15, and 20. (f)-(i)
LTSA-Correlation with K = 5, 10, 15, and 20.



LI and PNL mixtures, hence the symmetry which appears in
the plots in Fig. 6a. Figures 6b-6e highlight different aspects
of the LTSA-PCA measure. We notice that the measures are
sensitive to the choice of K. For example, the sensitivity
for high values of $\eta_x$ (typically $\eta_x > 0.9$) decreases when
K increases. Similarly, the area of high values of specificity
increases with K. Figures 6f-6i illustrate a different behaviour
of the LTSA-Correlation measure. For example, these plots
show that this measure is less sensitive to the choice of K
than the LTSA-PCA one. In particular, the specificity is
much higher for a much wider range of threshold values,
thus showing that the risk of keeping "false" single-source
zones is lower with the LTSA-Correlation than with the
LTSA-PCA. However, the sensitivity is lower as well, meaning
that the LTSA-Correlation discards more "real" single-source
zones than the LTSA-PCA. As discussed above,
this last aspect is less detrimental than the previous one.

To conclude, the normalized mutual information and the
correlation-based confidence measure should be preferred and
the eigendecomposition-based measure should be avoided. In
the rest of the paper, we only use the mutual information to
find single-source zones.
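
The restricted threshold range used above for the LTSA-PCA measure (0.5 to 0.99) can be illustrated numerically for $P = 2$ observations. The sketch below assumes that the measure is the largest eigenvalue of the 2 x 2 sample covariance divided by the sum of both eigenvalues; this is one reading consistent with the stated 0.5 to 1 range, not necessarily the exact definition used in the paper. The function name and the sample values are hypothetical.

```python
import math


def eigenvalue_ratio(x1, x2):
    """Largest covariance eigenvalue over the sum of both (P = 2 observations).

    Assumes at least one of the two signals is not constant.
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    c11 = sum((a - m1) ** 2 for a in x1) / n
    c22 = sum((b - m2) ** 2 for b in x2) / n
    c12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / n
    # Closed-form largest eigenvalue of the symmetric matrix [[c11, c12], [c12, c22]].
    half_trace = (c11 + c22) / 2
    radius = math.sqrt(((c11 - c22) / 2) ** 2 + c12 ** 2)
    return (half_trace + radius) / (c11 + c22)


# Collinear observations (one direction only): the ratio reaches its maximum.
ratio_collinear = eigenvalue_ratio([1.0, -2.0, 0.5, 3.0], [2.0, -4.0, 1.0, 6.0])

# Uncorrelated observations with equal variances: the ratio is at its minimum.
ratio_isotropic = eigenvalue_ratio([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0])
```

Under this assumption, a value near 1 corresponds to a scatter plot concentrated along one line, while 0.5 corresponds to no preferred direction, which is why thresholds below 0.5 are not informative.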

B. Comparison of the functional data clustering approaches 

We now compare the performance of the functional clustering
techniques. For that, we use the same protocol as in
Section VI-A: we considered the same sets of sources, the
same mixing matrices, and the same nonlinear functions. We
selected the single-source zones by estimating the mutual
information between observations. The threshold $\epsilon_1$ in Eq.
(19) was set to 0.01, while the threshold $\epsilon_2$ was set^10 to 0.05.
The final estimation of the nonlinear mappings is obtained by
computing a B-spline of order 4 with 20 knots, and its accuracy is
measured by the MSE between the theoretical and estimated
functions. Let us note
that, when this last measure is quite high (say on the order of
10^^), it does not necessarily mean that the clustering stage
was not successful, but that the estimation of the nonlinear
mappings is not accurate enough. A better approximation may
be obtained with other approaches (e.g. with another B-spline
order or with kernel regression). However, this measure allows
us to compare the performance obtained with the proposed
clustering methods.
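
The MSE on sampled curves used as the accuracy measure can be written in a few lines. In this sketch the sampling grid (201 points between -1.5 and 1.5) and the "estimated" function (a third-order Taylor polynomial of tanh) are assumptions chosen only to produce a nonzero error; they do not correspond to any result reported here.

```python
import math


def sampled_mse(f_true, f_est, grid):
    """Mean square error between two functions sampled on a common grid."""
    return sum((f_true(u) - f_est(u)) ** 2 for u in grid) / len(grid)


# Hypothetical sampling grid: 201 evenly spaced points in [-1.5, 1.5].
grid = [-1.5 + 3.0 * i / 200 for i in range(201)]


def rough_estimate(u):
    """Stand-in for an estimated mapping: Taylor expansion of tanh around 0."""
    return u - u ** 3 / 3


mse_rough = sampled_mse(math.tanh, rough_estimate, grid)
mse_exact = sampled_mse(math.tanh, math.tanh, grid)
```

As noted in the text, a large value of this measure reflects the quality of the final curve fit and does not by itself show that the zones were assigned to the wrong cluster.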

We aimed to measure the influence of the mixing parameters
on the global quality of estimation of the nonlinear mappings,
and the influence of the parameters to be set in the clustering
methods. The filtering clustering method uses a B-spline
approximation, for which we need to fix the knot locations and
the order. We tested the approach with evenly-spaced
knots between -1.5 and 1.5, with a step-size of 0.5 or 0.3.
Additional tests (not reported here for space considerations)
were performed with evenly-spaced knots ranging from -2 to
2, with a step-size of 1, 0.5, or 0.25. However, these experiments
yielded a lower performance than those reported
in this paper. For each of these tests, the order of the spline

^10 In some preliminary tests, we found that the value of this threshold had
almost no effect on the global performance of the clustering methods. This
was expected, as explained in Section IV-B.
varied from 3 to 6. In each experiment, when the Schoenberg-Whitney
condition was not satisfied for a given single-source
zone $\mathcal{T}$, this zone was discarded. The approaches using the
linear approximation around zero use a K-NN technique. We
set the value of K to 5, 10, 15, and 20.

The average MSEs and their associated standard deviations
are given in Table I. We first noticed that the performance of
the clustering approach using B-splines highly depends on the
choice of the knot locations. In particular (and this result is
not evident from the table), we noticed that the MSEs can be
very low with one of the three tested sets of functions but
really high for another one. Even if we may get very accurate
results, as we showed in [12], [13], we never found a set
of knots and a spline order yielding very accurate estimates
for all the tested nonlinear mappings. The best performance
was obtained with a step between knots equal to 0.5, and
a spline of order 4 or 5. However, this performance is much
lower than the one obtained with the other proposed clustering
methods. Indeed, the linear approximation-based techniques
yield accurate estimates, except when $\lambda = 0.1$, which is
consistent with the performance of other PNL-ICA methods
(see m Ch. 14]) and the one obtained in IS) for an LI-SCA
method. For the other tested mixing matrices, the accuracy
of clustering and estimation of the nonlinear mappings is
really good, which shows that these methods are more flexible.
Moreover, the approaches do not seem to be very sensitive to
the choice of K, as the performance does not change much,
except when K = 5, where the MSEs are higher than with the
other tested values of K. This shows the relevance and the
flexibility of the proposed methods.



VII. Conclusion and future work 

In this paper, we introduced several PNL mixture iden- 
tification methods which use weak sparsity assumptions in 
order to estimate the mixing parameters. The main nov- 
elty of the proposed method is that we combine single- 
source confidence measures with functional data clustering 
techniques. The proposed approaches thus improve on the
previously proposed PNL-SCA methods, which rely on strong
joint-sparsity assumptions and cannot be applied to speech
and audio signals. We conducted several experiments showing
the performance and the relevance of our approaches. In
future work, we will propose an approach for inverting the
nonlinearities. Indeed, here we focused on the estimation of the
nonlinear functions, whose accuracy has a direct consequence
on the final separation. Inversion approaches presented in [8],
[10] can easily be used with our proposed approaches, but we
will propose an alternative to them and will compare their
respective performance. Another future direction will consist 
of investigating sparsifying transforms well-suited to nonlinear 
mixtures. Indeed, while the proposed approach may be applied 
to speech signals, the required sparsity assumption is not met 
with music signals. Lastly, we will extend our approach to 
other nonlinear mixtures, like PNL convolutive mixtures. 



Appendix A 
General nonlinear mixtures 

The methods we propose in this paper, for both finding 
single-source zones and clustering the scattered functional 
data, may be applied to more general mixtures than the PNL 
ones, as we will now see. 

We assume that $N$ real source signals $s(t) =
[s_1(t), \dots, s_N(t)]^T$ are mixed by an unknown instantaneous
nonlinear mapping $\mathcal{A}$ from $\mathbb{R}^N$ to $\mathbb{R}^P$. Observed signals then
read:

$$x(t) = \mathcal{A}(s(t)). \tag{33}$$

This mixing model is extremely general and it is well known 
m Ch. 14] that it cannot be solved by only assuming source 
mutual independence. Assumption 1 may now be rewritten in
this new framework as follows. 

Assumption 3: (i) The nonlinear mapping $\mathcal{A}$ is smooth; (ii)
we assume we know the value of $\mathcal{A}$ for one value $u_0 \in \mathbb{R}^N$,
and in particular, without loss of generality, we assume that
$u_0 = 0$ and that

$$\mathcal{A}(0) = 0. \tag{34}$$

Lastly, (iii) $\mathcal{A}$ may be completely estimated by its values in
single-source zones $\mathcal{T}$:

$$x_i(t) = \mathcal{A}_i(s(t)) = \mathcal{A}_{ik}(s_k(t)), \quad \forall i \in \{1, \dots, P\}, \tag{35}$$

where $\mathcal{A}_{ik}$ is an invertible nonlinear function from $\mathbb{R}$ to $\mathbb{R}$.
Assumptions 3(i) and 3(iii) are needed in order to interpolate
$\mathcal{A}$ from the functions $\mathcal{A}_{ik}$. Assumption 3(ii) is needed to suppress the
ambiguities that may appear in the selection of single-source
zones, as we faced with PNL mixtures in Section III-C.
Note that Assumption 3(iii) allows us to tackle many NL
configurations, as we will now see. The PNL mixture model,
for example, satisfies this assumption: the nonlinear mapping $\mathcal{A}$ may
be rewritten as the composition

$$\mathcal{A} = f \circ A, \tag{36}$$

where $f$ and $A$ model each part of the PNL mixture, as defined
in Section II. Assumption 3(iii) also allows us to process the
situation where each NL function $\mathcal{A}_i$ defined in Eq. (35) is
written as a linear combination of NL functions $\mathcal{A}_{ij}$ defined
from $\mathbb{R}$ to $\mathbb{R}$:

$$\mathcal{A}_i(s(t)) = \sum_{j=1}^{N} \mathcal{A}_{ij}(s_j(t)). \tag{37}$$

More generally, Assumption 3(iii) allows us to estimate
mappings $\mathcal{A}_i$ that can be inferred from the functions $\mathcal{A}_{ik}$
defined in Eq. (35). Assumption 2 may now be rewritten as:
Assumption 4: (i) Source signals are mutually independent
and (ii) by considering several single-source analysis zones
associated with the same source, the amplitude of the
observations spans a "wide" range allowing the estimation of the
NL functions $\mathcal{A}_{ik}$.

One can then see the connections between this
problem and the PNL problem considered in the main part
of this paper. If a source, say $s_k$, is isolated, then Eq. (35)

TABLE I
Performance of the functional clustering methods

Filtering clustering [19] with K-medians [6]

Knot step   Spline order   Statistic   λ = 0.1    λ = 0.5      λ = 0.9
0.5         3              MSE         1 1 98     0.0364       0?87
                           Std.        0.4539     0.2033       0.1757
0.5         4              MSE         0.1243     0.0079       0.0010
                           Std.        0.5038     0.0642       0.0040
0.5         5              MSE         0.0777     0.0083       0.0005
                           Std.        0.3766     0.0845       0.0021
0.5         6              MSE         0.6310     2.1020       5.5983
                           Std.        1.8711     4.2003       12.7026
0.3         3              MSE         1 0/170    (illegible)  1 SQ9
                           Std.        2.8847     0.5025       0.3810
0.3         4              MSE         0.3813     0.1840       0.1790
                           Std.        1.0729     0.3692       0.4804
0.3         5              MSE         0.1720     0.0977       0.1426
                           Std.        0.3546     0.2868       0.5043
0.3         6              MSE         0.2406     0.1509       0.0639
                           Std.        1.032      0.8957       0.2863

Linear approximation

Variant     K              Statistic   λ = 0.1    λ = 0.5      λ = 0.9
using [7]   5              MSE         0.1084     0.0406       0.0763
                           Std.        0.5587     0.1689       0.4036
using [7]   10             MSE         0.1002     0.0003       0.0005
                           Std.        0.5704     0.0011       0.0022
using [7]   15             MSE         0.1270     0.0004       0.0003
                           Std.        0.6864     0.0012       0.0011
using [7]   20             MSE         0.1141     0.0004       0.0003
                           Std.        0.5931     0.0013       0.0011
using [5]   5              MSE         1 1 ZLS    (illegible)  07^0
                           Std.        0.5564     0.2377       0.3380
using [5]   10             MSE         0.0777     0.0004       0.0027
                           Std.        0.3760     0.0012       0.0261
using [5]   15             MSE         0.1275     0.0004       0.0003
                           Std.        0.6881     0.0012       0.0011
using [5]   20             MSE         0.1400     0.0004       0.0003
                           Std.        0.7113     0.0013       0.0011


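holds and we then obtain, assuming that $\mathcal{A}_{ik}$ is invertible,

$$s_k(t) = \mathcal{A}_{ik}^{-1}(x_i(t)), \quad \forall i \in \{1, \dots, P\},\ \forall t \in \mathcal{T}. \tag{38}$$

We thus have the following relationship between observations
$x_1$ and $x_i$, for all $t \in \mathcal{T}$:

$$x_i(t) = \mathcal{A}_{ik}\left(\mathcal{A}_{1k}^{-1}(x_1(t))\right) = \phi_{ik}(x_1(t)), \tag{39}$$

where the functions $\phi_{ik}$ are defined as:

$$\phi_{ik}(u) = \mathcal{A}_{ik}\left(\mathcal{A}_{1k}^{-1}(u)\right). \tag{40}$$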

Estimating single-source zones and clustering the scattered
functional data in this framework can be done with the 
same approach that we defined for PNL mixtures. The main 
difficulty then consists of inverting the nonlinear functions, 
which is not in the scope of this paper. 

To demonstrate the validity of our extension, we here
consider a toy example: we generate $N = 2$ Gaussian noise
sources containing 10^ samples, for which we set the first
and last 2000 samples to 0, in the first and second source
respectively. We then mixed these sources by following the
mixture model defined in Eq. (37), with the following nonlinear
functions:

$$\mathcal{A}_{11}(t) = \mathcal{A}_{22}(t) = \tanh(t), \qquad \mathcal{A}_{12}(t) = \mathcal{A}_{21}(t) = t. \tag{41}$$

We then used the same values as in Section VI-B to set
our parameters, with the best-performing parameters found above for the
clustering approaches: the filtering method used B-splines
of order 4, with evenly spaced knots ranging from
-1.5 to 1.5 with a step-size of 0.5. The LTSA-based approaches
used K = 15 neighbors. Nonlinear functions were lastly
approximated with B-splines of order 4, with 20 knots. The
experiment was conducted 10 times and the MSEs between the
theoretical and estimated nonlinear functions were averaged.
The three methods yielded almost the same performance: the
filtering approach provided a mean MSE equal to 0.0104, while
both LTSA-based methods provided an MSE equal to 0.0110.
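
The structure of this toy example can be checked with a short sketch: in each single-source zone, the two observations lie on a single curve, as stated by Eq. (39). The signal length of 10000 samples, the unit variance, and the random seed are assumptions of this example; the function name is hypothetical. Only the mixture model of Eq. (37), the functions of Eq. (41), and the 2000 silent samples come from the text.

```python
import math
import random


def toy_mixture(n_samples=10_000, n_silent=2_000, seed=0):
    """Two Gaussian sources mixed with Eq. (37) and the functions of Eq. (41)."""
    rng = random.Random(seed)
    s1 = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    s2 = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    s1[:n_silent] = [0.0] * n_silent   # source 1 silent at the beginning
    s2[-n_silent:] = [0.0] * n_silent  # source 2 silent at the end
    x1 = [math.tanh(a) + b for a, b in zip(s1, s2)]  # A11 = tanh, A12 = identity
    x2 = [a + math.tanh(b) for a, b in zip(s1, s2)]  # A21 = identity, A22 = tanh
    return x1, x2


x1, x2 = toy_mixture()

# First 2000 samples: only source 2 is active, so x2 = tanh(x1).
err_first = max(abs(b - math.tanh(a)) for a, b in zip(x1[:2000], x2[:2000]))

# Last 2000 samples: only source 1 is active, so x1 = tanh(x2).
err_last = max(abs(a - math.tanh(b)) for a, b in zip(x1[-2000:], x2[-2000:]))
```

The two zones therefore produce two distinct curves in the scatter plot of $x_2$ against $x_1$, which is the functional data that the clustering methods have to separate.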